Word Knowledge Acquisition, Lexicon Construction and Dictionary Compilation
Abstract
We describe an approach to semiautomatic lexicon development from machine-readable dictionaries with specific reference to verbal diatheses, envisaging ways in which the results obtained can be used to guide word classification in the construction of dictionary databases.

1 Introduction

The acquisition and representation of lexical knowledge from machine-readable dictionaries and text corpora have increasingly become major concerns in Computational Lexicography/Lexicology. While this trend was essentially set by the need to maximize cost-effectiveness in building large-scale Lexical Knowledge Bases (LKBs) for NLP, there is a clear sense in which the construction of such knowledge bases also caters to the demand for better dictionaries. Currently available dictionaries and thesauri provide an undoubtedly rich source of lexical information, but often omit or neglect to make explicit salient syntactic and semantic properties of word entries. For example, it is well known that the same verb sense can appear in a variety of subcategorization frames which can be related to one another through valency alternations (diatheses). Some dictionaries provide subcategorization information by means of grammar codes, as shown below for the "sail" sense of the verb dock in LDOCE, the Longman Dictionary of Contemporary English (Procter, 1978).

(1) dock [T1;I0 (at)] ...

The codes [T1;I0 (at)] indicate that the verb can be either transitive or intransitive, with the possible addition of an oblique complement introduced by the preposition at:

(2) a. [T1 (at)]: Kim docked his ship (at Glasgow)
    b. [I0 (at)]: The ship docked (at Glasgow)

Unfortunately, an indication of the diatheses which relate the various occurrences of the verb to one another is rarely provided.
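To make the problem concrete, the sketch below expands a grammar-code string such as [T1;I0 (at)] into explicit frames, one per code with and without the optional oblique. It is a minimal Python illustration under a hypothetical simplification of LDOCE notation (codes separated by ';', a single trailing parenthesized preposition applying to every code); the names `Frame`, `expand_codes` and `describe` are invented for illustration, and real LDOCE codes carry more structure than this parser handles. Note that nothing in its output records that the frames are related by a diathesis.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Frame:
    """One subcategorization frame read off a grammar code (toy model)."""
    code: str            # base code, e.g. "T1" (transitive) or "I0" (intransitive)
    transitive: bool     # True if the frame has a direct object
    pp: Optional[str]    # preposition heading an oblique complement, if present


def expand_codes(codes: str) -> List[Frame]:
    """Expand a code string such as "T1;I0 (at)" into explicit frames.

    Hypothetical simplification of LDOCE notation: codes are separated by
    ';' and one trailing parenthesized preposition marks an optional
    oblique that applies to every code in the string.
    """
    body, _, rest = codes.partition("(")
    prep = rest.rstrip(")").strip() or None
    frames = []
    for code in (c.strip() for c in body.split(";")):
        if not code:
            continue
        transitive = code.startswith("T")
        frames.append(Frame(code, transitive, None))      # without the oblique
        if prep:
            frames.append(Frame(code, transitive, prep))  # with the oblique
    return frames


def describe(frame: Frame) -> str:
    """Render a frame as a flat category sequence, e.g. "NP V NP PP[at]"."""
    core = "NP V NP" if frame.transitive else "NP V"
    return core + (f" PP[{frame.pp}]" if frame.pp else "")


if __name__ == "__main__":
    # A naive conversion creates one separate, unrelated entry per frame.
    for i, frame in enumerate(expand_codes("T1;I0 (at)"), start=1):
        print(f"dock_{i}: {describe(frame)}")
```

Run as a script, it prints four lines (dock_1: NP V NP, dock_2: NP V NP PP[at], dock_3: NP V, dock_4: NP V PP[at]), one per entry a naive conversion would create, with no link between them.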
Consequently, if we were to use the grammar code information found in LDOCE to create verb entries in an LKB by automatic conversion, we would construct four seemingly unrelated entries for the verb dock (see §3). Inadequacies of this kind may be redressed through semiautomatic techniques which make it possible to supply information concerning amenability to diathesis alternations, so as to avoid expanding distinct entries for related uses of the same verb. This practice would allow us to develop an LKB from dictionary databases which offers a more complete and linguistically refined repository of lexical information than the source databases. Such an LKB would be used to generate lexical components for NLP systems, and could also be integrated into a lexicographer's workstation to guide word classification.

*The research reported in this paper was carried out within the ACQUILEX project. I am indebted to Ted Briscoe, Ann Copestake and Pete Whitelock for helpful comments.

2 The ACQUILEX Lexicon Development Environment

Our points of departure are the tools for lexical acquisition and knowledge representation developed as part of the ACQUILEX project ('The Acquisition of Lexical Knowledge for NLP Systems'). The ACQUILEX Lexicon Development Environment uses typed graph unification with inheritance as its lexical representation language (for details, see Copestake (1992), Sanfilippo & Poznański (1992), and papers by Copestake, de Paiva and Sanfilippo in Briscoe et al. (1993)). It allows the user to define an inheritance hierarchy of types with associated restrictions expressed in terms of attribute-value pairs, as shown in Fig 1, and to create lexicons in which such types are used to build lexical templates that encode word-sense-specific information extracted from MRDs, such as the one in Fig 2.
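The idea of types that carry attribute-value restrictions, inherited down a hierarchy and combined by unification, can be approximated with a toy typed unifier. The sketch below is a minimal illustration, assuming a tree-shaped hierarchy (each type has one parent), atomic values compared by equality, and no structure sharing; the ACQUILEX representation language is considerably richer, and the type names (`verb`, `trans-verb`, `intrans-verb`) and attributes (`CAT`, `SUBJ`, `OBJ`, `ORTH`) are hypothetical stand-ins, not those of Fig 1. Each type's restrictions are inherited by its subtypes, and unification fails when two types have no common subtype.

```python
from typing import Optional, Union

# Toy, hypothetical type hierarchy: child type -> parent type ("top" is the root).
PARENT = {
    "verb": "top",
    "trans-verb": "verb",
    "intrans-verb": "verb",
}

# Restrictions (attribute-value pairs) introduced by each type; subtypes inherit them.
CONSTRAINTS = {
    "verb": {"CAT": "v"},
    "trans-verb": {"SUBJ": "np", "OBJ": "np"},
    "intrans-verb": {"SUBJ": "np"},
}

# A value is either an atom (str) or a feature structure (dict with a "TYPE" key).
Value = Union[str, dict]


def ancestors(t: str) -> list:
    """Return t and all of its supertypes, most specific first."""
    chain = [t]
    while t != "top":
        t = PARENT[t]
        chain.append(t)
    return chain


def glb(a: str, b: str) -> Optional[str]:
    """Greatest lower bound in a tree: the more specific type if one
    subsumes the other, otherwise None (no common subtype)."""
    if a in ancestors(b):
        return b
    if b in ancestors(a):
        return a
    return None


def constraint_of(t: str) -> dict:
    """The feature structure imposed by type t: its own restrictions plus
    inherited ones (assumed mutually compatible in this toy)."""
    fs = {"TYPE": t}
    for anc in reversed(ancestors(t)):  # most general type first
        fs.update(CONSTRAINTS.get(anc, {}))
    return fs


def unify(a: Value, b: Value) -> Optional[Value]:
    """Unify two values; return None on failure. Atoms unify only if equal.
    Constraints of the resulting type are not re-applied here."""
    if isinstance(a, str) or isinstance(b, str):
        return a if a == b else None
    t = glb(a["TYPE"], b["TYPE"])
    if t is None:
        return None
    result = {"TYPE": t}
    for feat in (a.keys() | b.keys()) - {"TYPE"}:
        if feat in a and feat in b:
            v = unify(a[feat], b[feat])
            if v is None:
                return None
            result[feat] = v
        else:
            result[feat] = a[feat] if feat in a else b[feat]
    return result


def make_entry(orth: str, t: str) -> Optional[dict]:
    """Build a lexical template: the word's own information unified with
    everything its type (and its supertypes) requires."""
    return unify({"TYPE": t, "ORTH": orth}, constraint_of(t))


if __name__ == "__main__":
    print(make_entry("dock", "trans-verb"))
    # Sibling types have no common subtype, so unification fails:
    print(unify(constraint_of("trans-verb"), constraint_of("intrans-verb")))  # None
```

Because `trans-verb` and `intrans-verb` are siblings in this toy hierarchy, their templates do not unify; relating the transitive and intransitive uses of dock would need additional machinery, for instance a rule mapping one template onto the other.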
(Bold lowercase is used for types, caps for attributes, and boxes enclosing types indicate total omission of attribute-value pairs. Details concerning the encoding of verb syntax and semantics can be found in Sanfilippo (1993).) Feature Structure (FS) descriptions of word senses such as the one in Fig 2 are created semiautomatically through a program which converts syntactic and
Publication date: 1994